13 research outputs found

Optimal Locally Repairable Codes and Connections to Matroid Theory

Author: Dimakis Alexandros G.
Papailiopoulos Dimitris S.
Tamo Itzhak
Publication date: 06/11/2013

Petabyte-scale distributed storage systems are currently transitioning to erasure codes to achieve higher storage efficiency. Classical codes like Reed-Solomon are highly sub-optimal for distributed environments due to their high overhead in single-failure events. Locally Repairable Codes (LRCs) form a new family of codes that are repair efficient. In particular, LRCs minimize the number of nodes participating in single-node repairs, during which they generate little network traffic. Two large-scale distributed storage systems have already implemented different types of LRCs: Windows Azure Storage and the Hadoop Distributed File System RAID used by Facebook. The fundamental bounds for LRCs, namely the best possible distance for a given code locality, were recently discovered, but few explicit constructions exist. In this work, we present explicit and optimal LRCs that are simple to construct. Our construction is based on grouping Reed-Solomon (RS) coded symbols to obtain RS coded symbols over a larger finite field. We then partition these RS symbols into small groups and re-encode them using a simple local code that offers low repair locality. For the analysis of the optimality of the code, we derive a new result on the matroid represented by the code generator matrix. Comment: Submitted for publication, a shorter version was presented at ISIT 201
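The idea of repair locality in the abstract above can be illustrated with a toy sketch. It is not the paper's construction (which re-encodes Reed-Solomon symbols over a larger field): it assumes integer symbols, a single XOR parity per group, and the hypothetical helper names `add_local_parities` and `repair`. It only shows that a lost symbol can be rebuilt from the few other symbols in its own group.

```python
from functools import reduce
from operator import xor


def add_local_parities(symbols, r):
    """Split symbols into groups of at most r and append one XOR parity to each.

    Toy stand-in for a "simple local code"; an assumption for illustration,
    not the local code used in the paper.
    """
    groups = []
    for start in range(0, len(symbols), r):
        group = list(symbols[start:start + r])
        group.append(reduce(xor, group))  # local parity of this group
        groups.append(group)
    return groups


def repair(group, lost_index):
    """Rebuild the symbol at lost_index using only the rest of its group."""
    survivors = [s for j, s in enumerate(group) if j != lost_index]
    return reduce(xor, survivors)


if __name__ == "__main__":
    groups = add_local_parities([5, 9, 12, 7, 3, 8], r=3)
    # Losing the second symbol of the first group touches only 3 other symbols.
    print(repair(groups[0], lost_index=1))
```

With groups of size r plus one parity, any single repair reads r symbols, regardless of how many symbols the full codeword holds.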

arXiv.org e-Print Archive

CiteSeerX

A Repair Framework for Scalar MDS Codes

Author: Caire Giuseppe
Dimakis Alexandros G.
Papailiopoulos Dimitris S.
Shanmugam Karthikeyan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/12/2013

Several works have developed vector-linear maximum-distance separable (MDS) storage codes that minimize the total communication cost required to repair a single coded symbol after an erasure, referred to as repair bandwidth (BW). Vector codes allow communicating fewer sub-symbols per node, instead of the entire content. This allows nontrivial savings in repair BW. In sharp contrast, classic codes, like Reed-Solomon (RS), used in current storage systems, are deemed to suffer from naive repair, i.e., downloading the entire stored message to repair one failed node. This mainly happens because they are scalar-linear. In this work, we present a simple framework that treats scalar codes as vector-linear. In some cases, this allows significant savings in repair BW. We show that vectorized scalar codes exhibit properties that simplify the design of repair schemes. Our framework can be seen as a finite field analogue of real interference alignment. Using our simplified framework, we design a scheme that we call clique-repair, which provably identifies the best linear repair strategy for any scalar 2-parity MDS code, under some conditions on the sub-field chosen for vectorization. We specify optimal repair schemes for specific (5,3)- and (6,4)-Reed-Solomon (RS) codes. Further, we present a repair strategy for the RS code currently deployed in the Facebook Analytics Hadoop cluster that leads to 20% repair BW savings over naive repair, which is the repair scheme currently used for this code. Comment: 10 Pages; accepted to IEEE JSAC -Distributed Storage 201

arXiv.org e-Print Archive

Crossref

Locality and Availability in Distributed Storage

Author: Dimakis Alexandros G.
Papailiopoulos Dimitris S.
Rawat Ankit Singh
Vishwanath Sriram
Publication date: 01/01/2014

This paper studies the problem of code symbol availability: a code symbol is said to have $(r, t)$-availability if it can be reconstructed from $t$ disjoint groups of other symbols, each of size at most $r$. For example, $3$-replication supports $(1, 2)$-availability as each symbol can be read from its $t = 2$ other (disjoint) replicas, i.e., $r = 1$. However, the rate of replication must vanish like $\frac{1}{t+1}$ as the availability increases. This paper shows that it is possible to construct codes that can support a scaling number of parallel reads while keeping the rate to be an arbitrarily high constant. It further shows that this is possible with the minimum distance arbitrarily close to the Singleton bound. This paper also presents a bound demonstrating a trade-off between minimum distance, availability and locality. Our codes match the aforementioned bound and their construction relies on combinatorial objects called resolvable designs. From a practical standpoint, our codes seem useful for distributed storage applications involving hot data, i.e., the information which is frequently accessed by multiple processes in parallel. Comment: Submitted to ISIT 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

MLSys: The New Frontier of Machine Learning Systems

Author: Alistarh Dan
Alonso Gustavo
Andersen David G
Bailis Peter
Bird Sarah
Carlini Nicholas
Catanzaro Bryan
Chayes Jennifer
Chung Eric
Dally Bill
De Sa Christopher
Dean Jeff
Dhillon Inderjit S
Dimakis Alexandros
Dubey Pradeep
Elkan Charles
Fursin Grigori
Ganger Gregory R
Getoor Lise
Gibbons Phillip B
Gibson Garth A
Gonzalez Joseph E
Gottschlich Justin E
Han Song
Hazelwood Kim
Huang Furong
Jaggi Martin
Jamieson Kevin
Jordan Michael I
Joshi Gauri
Khalaf Rania
Knight Jason
Konecny Jakub
Kraska Tim
Kumar Arun
Kyrillidis Anastasios
Lakshmiratan Aparna
Li Jing
Madden Samuel
McMahan H B
Meijer Erik
Mitliagkas Ioannis
Monga Rajat
Murray Derek
Olukotun Kunle
Papailiopoulos Dimitris
Pekhimenko Gennady
Ratner Alexander
Re Christopher
Rekatsinas Theodoros
Rostamizadeh Afshin
Sedghi Hanie
Sen Siddhartha
Smith Virginia
Smola Alex
Song Dawn
Sparks Evan
Stoica Ion
Sze Vivienne
Talwalkar Ameet
Udell Madeleine
Vanschoren Joaquin
Venkataraman Shivaram
Vinayak Rashmi
Weimer Markus
Wilson Andrew G
Xing Eric
Zaharia Matei
Zhang Ce
Publication venue: ScholarlyCommons
Publication date: 01/01/2019

Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.

arXiv.org e-Print Archive

ScholarlyCommons@Penn